EXECUTIVE SUMMARY



Historically, software testing was the process of exercising a computer
program to verify that it performed as required and expected. The strategic goal
of software testing was to demonstrate correctness and quality. We now know that
this view of testing is not correct. Testing cannot produce quality software, nor can
it confirm correctness. Testing can only verify the presence (not the absence) of
software defects. Yet the difficulty of testing and the impracticality of correctness
proofs have often driven us to the dangerous perception that if testing does not find
defects, then the software is correct.

In the early 1980s, software testing concepts were neither well-developed nor
well-understood [1, p. 39]. While testing techniques were many, supporting
theories were few. Even worse, little or no guidance existed for making intelligent
choices of technique(s) [2, vol. 1, p. 24]. During the 1980s, the Department of
Defense (DoD) and industry gathered much empirical evidence to justify many
software quality and software development techniques. As a result, the scope of
software testing has evolved into an integrated set of software quality activities
that cover the entire life cycle [3]. Software tests now take different forms and
apply to all software products, including requirements, design, documentation,
test plans, and code. Each test contributes to a total quality assurance plan.

Quality assurance focuses on the front of the development process and
emphasizes defect prevention over detection. A cost-effective prevention
program first requires accurate error detection and analysis to understand where,
how, and why defects are inserted. Though testing cannot prevent errors, it is the
most important method for producing the error data necessary to guide process
improvement. However, the following extract from the 1992 Software
Maintenance Technology Reference Guide [4] summarizes the difficulty of
testing:

"Software implementation is a cozy bonfire, warm, bright, a bustle of 
comforting concrete activity. But beyond the flames is an immense zone 
of darkness. Testing is the exploration of this darkness."

The conclusions of this report are not revolutionary, but they may be
surprising. DoD knows how to produce quality software. A few
contractors produce quality software (though not necessarily for DoD) using
many of the policies published in DoD Standards. These documents describe the
need to focus on quality activities early in the software life cycle. Developers and
verifiers should identify and remove errors during requirements definition and
design so that they do not enter the code, where finding and fixing defects is
extremely expensive. For management information and command/control
systems this is a particularly difficult task, because most requirements for these
systems are based upon human demands, which are highly subjective, easily
influenced, and thus very dynamic and difficult to state precisely.

Although not yet in common practice for software development, quality
control methods adapted from the factory paradigm [5] may have the greatest
potential to move software production from an art to a true engineering discipline
[6, 7, 8, 9]. Both the products and the development process should be subjected to
these procedures. To engineer quality into the software products, we must
inspect/test and remove defects from requirements, design, documentation, code,
test plans, and tests. Quality control of the development process requires that we
establish standard procedures to measure defects, determine their root causes,
and take action to prevent their future insertion. Such a process is self-correcting, and
future measurements will provide convincing evidence of cost-effective
improvement. In summary, software quality improvement is evolutionary and
requires that we control, coordinate, and feed back into three concurrent
processes: the software development process, the error detection process (testing
life cycle), and the quality improvement process. Figure 1 depicts the
relationships between the processes in the software life cycle.
A few corporate organizations have successfully implemented these
procedures [10, 11, 12, 13, 14, 15, 16, 17]. The common key element in these
successes is organization-wide commitment to a quality attitude and disciplined
life cycle procedures. However, within DoD the perception persists that such
practices are not cost-effective. Simply mandating their use has not been
adequate. Even if enforced, the techniques can be undermined, and neither
software quality nor the perceptions will change [10]. DoD must jump-start these
procedures with an active campaign to establish and nurture a quality attitude
both internally and in its contractors. IBM Federal Systems Company (FSC)
Houston took 15 years to refine its processes into producing high quality
software. But it also believes that other organizations can learn from its
procedures without investing such time. What can make this possible is the fact
that its procedures already correlate well with written DoD policies, the policies
of other corporate software developers, and the recommendations of academia.
The difference is that IBM has disciplined itself to practice them. DoD should
take advantage of this knowledge and experience now, and adapt its own practices
accordingly.

In order to initiate the production of higher quality software within DoD, we
recommend the following actions:

(1) Actively motivate a software quality attitude in DoD and government 
contractors through management commitment, incentives for process 
improvement and quality, and technical training. Make quality as visible as the 
software product, its cost, and its schedule. For every change to the software product,
cost, or schedule, DoD project managers must give equal consideration to the
corresponding cost of and effect on quality.

(2) Motivate and make standard the use of formal inspections for all 
software products (requirements, documentation, design, code, test plans, tests). 

(3) Users, developers, and verifiers should jointly analyze requirements to 
ensure they are clearly documented, implementable, and testable. The formal 
analysis of quality objectives should be an integral part of this effort. A joint 
relationship should continue throughout the software life cycle. Eventually, this 
effort should result in documentation or data that directly cross-references test 
cases to requirements and code. At the same time, both developer and verifier
should independently plan, design, develop, inspect, execute, and analyze the
results of software tests.

(4) Measure and document errors throughout the life cycle. Establish a
formal defect prevention program which empowers developers and verifiers to
analyze the causes of error and enact improvements to their own local
development processes that will prevent future error insertion and enhance
detection processes.

(5) Evolve Computer-Assisted Software Engineering (CASE) tools to 
support all aspects of software development, testing, and maintenance. DoD 
should permit organizations to introduce standard CASE tools gradually, in
piecemeal fashion. An organization should purchase, train, and employ only
those tools for which its sub-processes are defined in writing. Start small and
allow adequate time to learn and gain experience. Purchase and integrate a new 
tool only when users understand the manual procedure the tool will automate, 
and the benefit of automating it. 

With regard to software testing in DoD, we can summarize our conclusions in
two fundamental ideas. First, DoD knows how to produce quality software at low
cost. This is because organizations such as DoD STEP, Army STEP, and the Software
Engineering Institute have already researched and documented policies for DoD.
A few commercial software developers practice many of the DoD policies and
directives now, and produce quality software (for example, IBM FSC Houston).
Second, quality cannot be tested into software. Only a well-defined,
well-disciplined process with a continuous improvement cycle can ensure
software quality. However, the importance of testing should not be underestimated. Systematic testing
activities that detect errors earliest in the life cycle are necessary to drive process
improvement and optimize the development of quality software. Testing
methods such as formal inspection find defects early. This enables cost-effective error
resolution, identification and removal of defect causes, and thus prevention of
future defect insertion. If practiced with discipline, such methods can evolve a
self-correcting software development process that is stable, modeled, measured,
and therefore predictable. This development process engineers quality software
faster at reduced cost.

This report discusses software testing practices and, more specifically, why
and how IBM's practices achieve high quality. Along the way, we will relate DoD
policies, instructions, and guidance to IBM's practices. We will also discuss
current initiatives within DoD which will impact software testing and quality.
Finally, we present our specific recommendations for software testing and quality
within DoD. We believe that these recommendations have the potential for
immediate value to DoD.
Administration (NASA), currently approaches 0.01 errors per thousand lines of
source code [10]. This figure is well below the U.S. industry average. Surprisingly
enough, there is nothing new or revolutionary about the way that IBM FSC
Houston develops or tests its software. Many of the same methods are used at
IBM FSD Rockville, as well as at other large software development corporations.
IBM FSC Houston practices basic software life cycle processes, most of which
have been known for at least a decade. These include requirements analysis,
formal inspections, configuration control, quality control, developmental testing,
and independent verification and validation.

So, why does IBM FSC Houston produce such high quality software? The
difference results from a strong attitude toward quality, the disciplined practice of
its basic processes, and a commitment to process improvement. From manager to
programmer, the entire organization strives to achieve zero defects through
prevention. In the classical waterfall model of software development, this
organization applies basic testing processes designed to identify errors as early as
possible. Once identified, defects in the software products are corrected.
Beyond that, continuous measurement, causal analysis, and subsequent cause
removal improve the development process and prevent future error insertion.
Their techniques are very closely related to concepts of Total Quality
Management (TQM) [22] and the software factory paradigm [5]. Recent
empirical evidence in other organizations [10, 11, 12, 13, 14, 15, 16, 17] confirms
the effectiveness of the software quality techniques practiced by IBM FSC
Houston. Later, we will discuss the techniques in more detail and relate them to
the DoD environment.

Critics maintain that because of fundamental differences, software
techniques used to develop and test weapons systems cannot be used efficiently or
effectively to produce information systems (and the reverse). IBM also believed
this until the late 1980s. However, on the basis of its own success in developing
high-quality flight control software, IBM FSC Houston began to develop its
ground system software (essentially MIS) using the same methods. The empirical
evidence speaks for itself. Error rates in delivered ground software decreased 
dramatically to the same levels achieved in flight software. Furthermore, this 
similarity in quality occurs in spite of the more extensive testing that safety critical 
flight software undergoes [10]. The quality achieved with early error detection 
and prevention techniques is largely independent of the type of software being 
developed. 

The development of large DoD Management Information Systems (MIS) and
Command/Control Systems (C2) software is costly and time-consuming. Much of
this cost and time can be attributed to the identification and repair of errors.
Closely related to defect repair is maintenance: rework necessitated by changing
requirements or latent defects. In such cases, software in operation must be
modified to reflect new requirements or requirements that were initially
ill-defined. To reduce the cost and time to produce and maintain software, DoD
must avoid passing immature software to the testing phase, or worse, to the
customer.



Testing is one of the most important quality tools. Properly applied, testing
helps to identify one of the greatest impediments to quality: error. However, if
quality software is the ultimate goal, then any discussion of effective software
testing must address the entire software life cycle. This is because testing alone
can neither produce nor guarantee software quality. Testing only finds faults; it
cannot demonstrate (in a practical sense) that faults do not exist. What we have
traditionally thought of as software testing tends to be labor-intensive, costly, and
ineffective. This view of testing is a paradox: testing is a process that instills
confidence in software by cleverly plotting to undermine that confidence [23].
Nevertheless, there is empirical evidence to suggest that old concepts of quality
control can counter this view. By expanding the concepts and practices of software
testing to all areas of the life cycle, we can optimize test efforts, increase their
effectiveness, and significantly reduce their cost. The result will be the delivery of
higher quality software on schedule for less money.

One reason for the general difficulty in testing software appears to stem from
differences among the testing models conceived in the minds of users, managers,
developers, analysts, and testers [3]. Without commonly accepted concepts, all vital
communication in large software development projects will amount, at best, to
assumptions and guesswork. Therefore, in order to clarify further
discussion, we summarize several fundamental definitions from the ANSI/IEEE
Glossary of Software Engineering Terminology [24], which is considered an industry standard:

error - a discrepancy in implementing requirements or design specifications. An 
error may manifest itself as incorrect or undesired results. 

fault - a defect in code that has the potential to cause (possibly visible) incorrect or
unexpected results. Faults are also known as bugs. Faults in code usually
result from errors.

debugging - the process of locating, analyzing, and correcting suspected faults. 

failure - the execution of a software fault or defect that manifests itself as incorrect
or undesired results.
testing - the process of exercising or evaluating a system or system components by
manual or automated means to verify that it satisfies specified requirements
or to identify differences between expected and actual results.

dynamic analysis - testing by executing code. 

static analysis - the process of evaluating a computer program without executing it;
e.g., review, desk check, inspection, walk-through.

correctness - use of this term usually means the composite extent to which:

(1) design and code are free from faults 

(2) software meets specified requirements 

(3) software meets user expectations 

verification - 

(1) the process of determining whether or not the products of a given phase
of the software development cycle fulfill the requirements established
during the previous phase.

(2) formal proof of program correctness.

(3) the act of reviewing, inspecting, testing, checking, auditing, or otherwise
establishing and documenting whether or not items, processes, services, or
documents conform to specified requirements.

validation - the process of evaluating software at the end of the software
development process to ensure compliance with software requirements.

Several of these terms have subtle relationships and differences in meaning.
It is important to recognize that errors relate to early phases of the life cycle:
requirements definition and design specification. An error in requirements or
design causes the insertion of a fault into code. However, a fault may not be
visible during code execution, whether during testing or operation. If a fault is
executed, then it may result in a visible failure (but not necessarily). Programmers
debug code to correct faults by using visible failures as a guide. However, the lack
of failures cannot guarantee the absence of faults. Even if the fault executes, it
may not be visible as output. Furthermore, fault correction does not necessarily
imply that the error(s) that induced the fault have been corrected.
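The fault/failure distinction can be illustrated with a small, hypothetical Python sketch; the functions and their requirements are invented for this purpose and do not come from any cited project. In `magnitude`, a faulty comparison executes for an input of zero without producing a visible failure. In `days_in_february`, an omitted leap-year rule passes a plausible set of test years and fails only for years that exercise the fault; `correct_days_in_february` plays the role of the test oracle.

```python
def magnitude(x: int) -> int:
    """Hypothetical requirement: return the absolute value of x."""
    # Fault: the condition should be `x >= 0`. For x == 0 the faulty branch
    # executes (returning -0), but -0 == 0 for integers, so no failure is visible.
    return x if x > 0 else -x


def days_in_february(year: int) -> int:
    """Hypothetical requirement: 29 days in Gregorian leap years, else 28."""
    # Fault: the century exception (divisible by 100 but not by 400) was
    # omitted, so years such as 1900 and 2100 are wrongly treated as leap years.
    return 29 if year % 4 == 0 else 28


def correct_days_in_february(year: int) -> int:
    """Reference oracle that defines the expected result for each test input."""
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    return 29 if leap else 28


def run_tests(years: list[int]) -> list[int]:
    """Return the inputs whose actual result differs from the expected result."""
    return [y for y in years if days_in_february(y) != correct_days_in_february(y)]


if __name__ == "__main__":
    # The faulty branch of magnitude() executes here, yet the output is correct.
    print(magnitude(0))
    # A plausible-looking test set reveals no failure, although a fault exists.
    print(run_tests([2023, 2024, 2000]))
    # Only inputs that exercise the fault make the failure visible.
    print(run_tests([1900, 2100]))
```

A passing test set shows only that the chosen inputs did not reveal a failure; it says nothing about faults that those inputs never reach.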

From the above discussion, one should conclude that effective software
testing cannot be limited to code. It must address all products of the software life
cycle. The definition of testing implies that testing demonstrates:

(1) that the code satisfies a specific requirement.

(2) whether faults exist in the code.
However, these are only the ideal goals of testing. In practice, they cannot be
achieved in the absolute sense. Furthermore, these goals are not necessarily
mutually exclusive. Code can often satisfy user requirements (as defined) and still
contain faults. The definition can easily convey the erroneous perception that
testing can verify correctness. Correctness is a major factor in software quality,
and by definition, relates to code, requirements, and user expectations. But
testing code only verifies the presence (not absence) of faults in code, and cannot
verify correctness or ensure quality. Testing code can verify the presence of
requirements only if they are defined precisely as test cases. Developers and users
do not normally view requirements in this manner. Effective testing identifies
errors before they become code faults, and therefore must apply to the entire life
cycle.

Since the 1980s, the scope of software testing has expanded to cover the entire
life cycle [3]. Empirical data from software projects in the last decade provide
convincing evidence that testing in this context can significantly improve software
quality. In its current model, software testing has a variety of forms that apply to a
range of products, including requirements, design specifications, documentation,
and test plans, as well as code. These techniques must be coordinated, disciplined, and
integrated throughout the entire life cycle to have a real impact on quality. We will
make the case that to have maximum positive effect on a large software project,
testers must participate in development and gain a broad understanding of the
software requirements and design. Therefore, in the remainder of this report we
will refer to software testing professionals as verifiers to highlight their expanded
roles, consistent with the definitions above.

3. Involve Verifiers in the Entire Development Life Cycle 

DoD STEP reports [25, vol. 3] indicate that the most successful DoD software
projects established independent test and evaluation organizations. Sometimes
these organizations were separate independent contractors. Other times they
were sub-organizations under the prime contractor, but with an independent
chain of command. This is an effective strategy, in common practice, to
help ensure objective, impartial, and unbiased testing. DoD directives provide for
such independent testing activities. Each military service has its own independent
test and evaluation organization. General IBM testing policies also define the
need for such organizations. Both IBM FSD Rockville and IBM FSC Houston have
independent verification organizations within their respective projects.

The advantages of independent testing should not overshadow the need for 
communication and coordination between verifier, user, and developer. While 
verifiers should plan, design, implement, and analyze software tests 
independently, they should not do so in isolation. Verifiers who must design and
perform operational tests cannot gain adequate understanding of the
requirements of a large software system by studying the system documentation
after development. They must take an active role in the requirements definition
and system design phases.

The biggest mistakes in software are almost always made early, during
requirements definition and design [26]. Empirical evidence indicates that the
cost of fixing errors, plotted against time in development, is an exponentially rising curve.
IBM FSC Houston data shows that average error repair costs increase 10 times in
each successive phase of the life cycle [10]. As a result of such data, in the
mid-1980s, IBM FSC Houston decided to move 30% of the resources it used for
testing code to assist in the requirements definition and design phases. This
decision resulted in a significant increase in software quality. Furthermore, this
shift resulted in a net decrease in total cost. Shell Research reported similar
results [12]. The conclusion is obvious: verifiers should participate in
requirements analysis, definition, and design. It is far cheaper to find and fix
errors before they become faults in the code.
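The tenfold-per-phase figure compounds quickly. The following minimal Python sketch assumes five phase names and an invented distribution of 100 errors; only the growth factor of 10 comes from the IBM FSC Houston data cited above. It compares the relative repair cost when most errors escape to testing and operation with the cost when most are caught in requirements and design.

```python
from __future__ import annotations

# Assumed phase names; the source states only that average repair cost
# grows about tenfold in each successive life-cycle phase.
PHASES = ["requirements", "design", "code", "test", "operation"]
GROWTH_PER_PHASE = 10


def relative_repair_cost(phase: str) -> float:
    """Cost to repair one error found in `phase`, relative to the requirements phase."""
    return float(GROWTH_PER_PHASE ** PHASES.index(phase))


def total_repair_cost(errors_found: dict[str, int]) -> float:
    """Total relative cost of repairing the errors found in each phase."""
    return sum(count * relative_repair_cost(phase) for phase, count in errors_found.items())


if __name__ == "__main__":
    for phase in PHASES:
        print(f"{phase:>12}: {relative_repair_cost(phase):>8,.0f}x")
    # Invented distributions of the same 100 errors by phase of detection.
    late = {"requirements": 10, "design": 10, "code": 20, "test": 50, "operation": 10}
    early = {"requirements": 40, "design": 30, "code": 20, "test": 9, "operation": 1}
    print(total_repair_cost(late), total_repair_cost(early))
```

With these assumed counts, the late-detection profile costs roughly seven times as much to repair as the early one, even though the number of errors is identical.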

DoD STEP identified the need for early test and evaluation activities in
software development. It also identified the need for integration of independent
verification organizations. One result of the DoD STEP recommendations is that
DoD Instruction 5000.2 states "Both developmental and operational testers shall
be involved early..." Army STEP has further defined procedures for close
coordination between verifiers, users, and developers. The new DA Pam 73-1
Volume 6, Software Test and Evaluation Guidelines [27], describes how software
testing and evaluation activities relate to each phase of the software life cycle. The
adoption of all or portions of DA Pam 73-1 into DoD instructions and directives
could reinforce and more precisely define the communications that should occur
among verifiers, developers, and the customer.

The Air Force Standard Systems Center (SSC) at Gunter Air Force Base in
Alabama takes customer involvement seriously. Some of its standard
information systems development work is contracted to local software firms.
However, as dictated by the terms of the contracts, Government personnel are
participating members of contractor developer and verifier teams. While this has
caused a few unusual and difficult situations, the overall strategy appears to work.
SSC anticipates that the result of these contracts will be well-defined
requirements and design, better quality software, and systems that are more easily
maintained after acceptance [28]. This SSC practice could be a model for DoD
contracted software development.
DoD Standard 2167A [19] clearly requires traceable and testable
requirements. Traceable requirements are defined and formulated such that
direct cross-referencing exists among requirements, design specifications, code,
and test cases. Traceability also implies that each requirement can be
implemented in both design and code. A requirement is testable if and only if it is
written so that developers and verifiers can prepare specific test cases that can
clearly confirm satisfaction of the specific requirement.
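A traceability matrix can be checked mechanically for missing cross-references. The Python sketch below is a hypothetical illustration: the `Requirement` record, its field names, and the identifiers are assumptions, not the format of any DoD standard or IBM tool. It can only detect that a link is missing; whether a linked test case actually confirms satisfaction of the requirement still requires human judgment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Requirement:
    """One row of a hypothetical traceability matrix."""
    req_id: str
    design_refs: list[str] = field(default_factory=list)
    code_refs: list[str] = field(default_factory=list)
    test_cases: list[str] = field(default_factory=list)


def trace_gaps(matrix: list[Requirement]) -> dict[str, list[str]]:
    """Map each requirement ID to the kinds of cross-reference it lacks."""
    gaps: dict[str, list[str]] = {}
    for req in matrix:
        links = {"design": req.design_refs, "code": req.code_refs, "test": req.test_cases}
        missing = [kind for kind, refs in links.items() if not refs]
        if missing:
            gaps[req.req_id] = missing
    return gaps


def requirements_for_test(matrix: list[Requirement], test_id: str) -> list[str]:
    """Reverse lookup: which requirements does a given test case claim to cover?"""
    return [req.req_id for req in matrix if test_id in req.test_cases]


if __name__ == "__main__":
    matrix = [
        Requirement("REQ-001", ["DES-4.1"], ["orders.py"], ["TC-010", "TC-011"]),
        Requirement("REQ-002", ["DES-4.2"], [], ["TC-020"]),
        Requirement("REQ-003"),
    ]
    print(trace_gaps(matrix))
    print(requirements_for_test(matrix, "TC-020"))
```

Run against this small matrix, `trace_gaps` reports REQ-002 as lacking code and REQ-003 as lacking design, code, and tests, while `requirements_for_test` answers the reverse question of which requirements a given test case covers.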

At both IBM FSD Rockville and IBM FSC Houston, developers and verifiers
work together to ensure that requirements are both traceable and testable when
defined. In fact, test engineers for the Advanced Automation System (AAS)
use a tool that automatically maintains the
relationships between requirements and test cases. This tool is essentially a
specialized database management system that assists developers and verifiers in
test management and configuration control. While such tools help to manage the
relationships and maintain the consistency of the software products once
developed, they cannot replace the difficult work required beforehand to ensure
traceability and testability. As practiced by IBM, success in this work is a direct
result of close communication and coordination among the users, developers,
and verifiers. IBM describes the relationship between its developers and verifiers
as friendly-adversarial. This means that both groups work together with the
customer toward a mutual understanding of the product requirements and design
and the early identification of errors. Finding and preventing errors are
considered primary job responsibilities for both developers and verifiers. At the
same time, each group independently designs its respective test plans and cases for
later verification and validation.

4. Formally Inspect All Software Products

[...] testing (i.e., execution of code) [...] cost-effective. Yet dynamic testing is
essential to confirm software quality. Dynamic testing should be planned at the
same time that requirements are analyzed and defined, and then executed
systematically as planned. However, if quality is the objective, then verification
cannot wait for code. Early detection techniques must be applied extensively to all
software products so that dynamic testing can be a cost-efficient and graceful
confirmation of functionality and quality.

The analyses necessary to define implementable, traceable, and testable
requirements help to avoid errors. However, one of the most effective early
detection methods is the formal inspection [29, 30] (also referred to as the Fagan
Inspection [31]). Developed by Dr. Michael Fagan in 1976, the formal inspection
is a general-purpose verification method. It is product-independent and can be
employed to identify errors in requirements, design, documentation, test plans, or
code. Thus, it has the potential to identify and permit removal of errors very early
during the software life cycle. In fact, IBM FSC Houston reports that its
application of formal inspections accounts for the identification of 80% of the
errors in the U.S. Space Shuttle flight software. Even so, the acceptance of formal
inspection into general practical use has been slow, for several possible reasons.
The technique has a reputation for being "low-tech." It requires a fair amount of
intensive, detailed work [15], although it does appear that automated tools could
enhance some of its procedures. Good empirical data verifying
its cost-effectiveness has not been available until the last several years. Even now,
published results are not prevalent. At least one software corporation considers
its use of formal inspection procedures a competitive advantage, and thus
declined to divulge its procedures [14].

A formal inspection is essentially a testing technique in which a software
product is formally examined by a team of experts. These experts include the
author of the product and several of his/her peers. Depending upon the product,
the team may also include a customer representative and a verifier. The primary
objective of the team is to find as many errors as possible. In such a situation,
finding errors must be considered in a positive sense, i.e., the team intensively
scrutinizes the product (code or documentation), not the author's abilities. The
team's responsibility is to help the author(s) by identifying mistakes, thus
preventing their entry into the next phase of the life cycle. This is done by
paraphrasing lines or portions of the software product at a slightly higher level of
abstraction or from a different perspective (such as from the verifier's view). The
error detection efficiency of this process results from its formality and intensity.
The procedures are defined and repeatable. Standard checklists ensure that
common mistakes are not overlooked.

As practiced by IBM FSC Houston, the formal inspection is the cornerstone
of software verification and process improvement. All software products must
submit to and pass a formal inspection prior to acceptance into configuration
control or submission for execution testing. Each product is examined by an
inspection team tailored to that product. For example, the inspection team for a
requirements definition document will include the customer, a requirements
analyst, a verifier, a programmer, as well as the author. The inspection team for
the independent verifier's test plan will include the customer, a requirements
analyst, and several verifiers. The inspection team for the developer's test plans
will include several requirements analysts and programmers. These particular
examples illustrate how the tailoring of inspection teams establishes a cooperative
yet independent relationship between developer and verifier. Each inspection
team includes a senior peer who acts as the moderator. The moderator must foster
cooperation and keep the team focused on the objective: to find errors. Management strongly
supports formal inspections, but does not participate in them. This ensures that
inspection results are used to rate the effectiveness of the technique and not the
performance of individuals. Inspection teams record errors identified, and
subsequently require authors to correct them. The requirement for
re-inspection depends upon the severity and number of errors recorded. Error
statistics from inspections of all products and phases of the software life cycle are
collected to measure process effectiveness.
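The record keeping described above might be sketched as follows. This is a hypothetical Python illustration: the severity scale, the re-inspection thresholds, and the field names are assumptions, since the source states only that re-inspection depends on the severity and number of errors. In keeping with the practice of not rating individuals, the `Defect` record deliberately carries no author field.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    """One error recorded by an inspection team (hypothetical fields).

    There is intentionally no author field: results describe the product
    and the process, not the performance of individuals.
    """
    product: str      # e.g. "requirements", "design", "code", "test plan"
    severity: str     # assumed two-level scale: "major" or "minor"
    description: str


def needs_reinspection(defects: list[Defect], max_major: int = 0, max_minor: int = 10) -> bool:
    """Hypothetical rule: re-inspect if major or minor counts exceed the limits."""
    counts = Counter(d.severity for d in defects)
    return counts["major"] > max_major or counts["minor"] > max_minor


def defects_by_product(defects: list[Defect]) -> dict[str, int]:
    """Aggregate error counts per product type, the raw data for process measurement."""
    return dict(Counter(d.product for d in defects))


if __name__ == "__main__":
    log = [
        Defect("design", "major", "interface omits error status"),
        Defect("design", "minor", "inconsistent terminology"),
        Defect("code", "minor", "loop bound off by one"),
    ]
    print(needs_reinspection(log))
    print(defects_by_product(log))
```

Aggregating by product type (and, in a fuller version, by the phase in which each error was inserted) is what turns individual inspection logs into the process-level statistics the paragraph describes.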

The advantages of formal inspections can be significant. Since formal
inspections are reported to detect 80% of all errors, subsequent dynamic testing
of code becomes more efficient. Fewer execution failures cause fewer
interruptions. This translates to additional time for more thorough testing, and
possibly less time required for regression testing. Formal inspections and dynamic
testing techniques complement each other. Each can detect flaws that the other
cannot [14, 15]. Execution testing detects faults and failures, the manifestations of
errors. On the other hand, formal inspections detect the errors which potentially
cause faults and failures.

Besides enabling early and effective error detection for a range of software
products, formal inspections have several indirect advantages. At IBM they
encourage the friendly-adversarial relationship between developers and verifiers
through teamwork, cooperation, distributed risk, and consensus. Developers,
verifiers, and the customer tend to focus effort on the most important aspects of
software development: requirements and design. Time and cost required for
testing and repair are diminished [32]. Formal inspections foster
understandability and standardization in all software products. They provide
excellent on-the-job training for all participants, since they teach technical
standards and organizational culture [12]. Furthermore, formal inspections
spread good ideas and eliminate bad approaches [31].

Formal inspections can require from 15% to 25% of total development time
[32], so DoD developers may be reluctant to expend limited resources to support
them. However, the resources necessary to implement them are not as great as
those necessary to find and fix errors later [10, 11, 12, 13, 14, 15]. A cost-benefit
analysis at Shell Research [12] reported an average of 30 hours of repair and
maintenance time saved for every hour of inspection time invested.
Bell-Northern Research [15] reported a 33:1 return. Other organizations have
reported more conservative returns of 2:1, 6:1, and 10:1 [14]. Note that these
estimates are based entirely on direct costs. They do not include other possible
savings from indirect costs related to customer confidence and the avoidance of
the consequences of operational failure.
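To make the cited ratios concrete, the sketch below converts a return ratio into net direct hours saved. It assumes, for illustration, that each ratio means hours of repair and maintenance saved per hour of inspection; the 100-hour inspection budget and the function name `net_hours_saved` are hypothetical. Like the estimates above, it counts direct costs only.

```python
def net_hours_saved(inspection_hours: float, return_ratio: float) -> float:
    """Direct repair/maintenance hours saved, minus the hours spent inspecting.

    Assumes return_ratio = hours saved per hour of inspection (illustrative).
    """
    return inspection_hours * return_ratio - inspection_hours


if __name__ == "__main__":
    budget = 100  # hypothetical inspection effort, in hours
    # Return ratios as reported in the text.
    ratios = [("Shell Research", 30), ("Bell-Northern Research", 33),
              ("other organizations", 2), ("other organizations", 6),
              ("other organizations", 10)]
    for source, ratio in ratios:
        print(f"{source}, {ratio}:1 -> net {net_hours_saved(budget, ratio):.0f} hours saved")
```

Under this reading, even the most conservative 2:1 ratio more than repays the inspection effort, since net savings stay positive whenever the ratio exceeds 1:1.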

DoD Instruction 5000.2 specifically states that DoD contractors should
practice walk-throughs, inspections, or reviews of requirements, documents,
design, and code [18]. Of these, the formal inspection is the most painstaking and
work-intensive technique. Reviews and walk-throughs are also useful, but they
have other goals, so they are less effective for detecting errors [32]. While formal
inspections have been shown to produce high-quality software at reduced
overall cost and time [31], the up-front investment in cost and time can
easily drive a decision not to employ them. This is apparently because the
consequences of quality are not as visible as those of cost and schedule in the early
phases of the life cycle. We discuss this further later. The fact is that the
resources expended to implement formal inspections can pay for themselves in a
short time by avoiding more expensive testing and maintenance later.

Of the techniques we discuss in this report, formal inspection appears to be
the basis for the others. This technique stimulates, coordinates, and checks the
developer/verifier coordinated requirements definition process. It does this by
promoting teamwork and shared responsibility for quality. It also produces early
defect data necessary to measure, feed, and guide process improvement. In
addition to IBM, several other companies have described their experiences with
the successful introduction of formal inspections. A few offer tips for overcoming
the difficulties of instituting them [12, 15]. We summarize these tips as follows:

(1) There must exist a belief that formal inspections will be effective. 
Dynamic code testing will always seem to be faster and more effective, but this is 
not true [15]. Changing this mind-set will require an active campaign to sell the
efficiency of formal inspections to all levels of the organization. Circulating
reports of success and training programs can accomplish this.

(2) Everyone must clearly understand formal inspection procedures. They 
are not informal. They are not cursory reviews, audits, or walk-throughs. Formal
inspections are manual, intensive, detailed, and painstaking. Education and 
training are the best ways to prepare. 

(3) It is essential to have management support. Management must be 
decisive and committed to the belief that formal inspections will pay off. This 
requires that the cost of formal inspections be quantified, and resources be
allocated to accommodate them in the schedule. Also, organizations must
anticipate adjustments to the procedures as they adapt inspections to their own 
local environments. 
(4) Early successes are critical, but also difficult to achieve. Start by
inspecting only one or two types of documents (for example, requirements
definition). The first products inspected may be riddled with defects. Early
inspections can easily become muddled in details until problems with standards
and procedures are resolved. Therefore, good moderators who can maintain
group momentum are essential in the early stages.

(5) Keep detailed statistics on defect identification and associated actions. 
This data feeds process improvement and provides clear evidence of effectiveness.
It will confirm belief in the process and strengthen commitment to it.

(6) The best training for inspections is on-the-job training. However,
continued formal training of inspection team moderators is particularly
important. Otherwise, as the effectiveness of the process becomes apparent to all,
the amount of materials and the number of required inspections can overwhelm
the best-planned schedules.

(7) The local development process must be well-defined and understood 
by the participants. Otherwise, formal inspections will be ineffective [31].

5. Use Error Data to Guide Defect Prevention and Process Improvement

Early identification and correction of errors is critical to software product
correctness and quality. Correcting errors in software is a fix, but not a solution.
Software errors are often the symptoms of a more fundamental process defect.
Typical process defects might be failure to follow a standard practice,
misunderstanding of a critical process step, or lack of adequate training. In the
software factory paradigm [5], the software development process is a special
manufacturing system to which many traditional quality control principles apply.
The developers and verifiers themselves (owners of the process) use error data to
measure and improve the process until it reaches a repeatable, predictable steady
state. Based on principles of Total Quality Management (TQM), formal process
improvement implements error prevention by removing the causes of errors
within the development process and the causes of not finding these errors earlier
in the detection processes.
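Removing causes rather than symptoms starts with classifying each recorded error by its underlying process defect and ranking those causes by frequency, a Pareto analysis familiar from TQM. The sketch below is a hypothetical illustration: the `ErrorRecord` fields, the cause labels (adapted from the typical process defects listed above), and the sample records are invented for the example, not drawn from IBM or DoD data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRecord:
    # Hypothetical fields; a real scheme would follow a locally defined
    # problem-reporting standard.
    error_id: str
    phase_inserted: str  # e.g. "requirements", "design", "code"
    root_cause: str      # the process defect behind the error


def pareto(records: list[ErrorRecord]) -> list[tuple[str, int]]:
    """Rank root causes by how many recorded errors they produced."""
    return Counter(r.root_cause for r in records).most_common()


if __name__ == "__main__":
    sample = [
        ErrorRecord("E1", "design", "standard practice not followed"),
        ErrorRecord("E2", "code", "standard practice not followed"),
        ErrorRecord("E3", "requirements", "process step misunderstood"),
        ErrorRecord("E4", "code", "inadequate training"),
        ErrorRecord("E5", "design", "process step misunderstood"),
        ErrorRecord("E6", "code", "standard practice not followed"),
    ]
    for cause, count in pareto(sample):
        print(f"{count:2d}  {cause}")
```

A process evaluation team would typically address the top-ranked cause first, since removing it is likely to prevent the largest share of future errors if the observed distribution persists.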

IBM FSC Houston practices process improvement. Developers and verifiers
form small process evaluation teams (in TQM terms, process action teams) to
analyze defects and identify their causes. These teams also determine how to
remove defect causes, and subsequently implement required process changes. The
effectiveness of these teams is rooted in TQM. Team members are the analysts,
programmers, and verifiers whose primary daily responsibilities are software
development. Therefore, those who execute the development process also
execute process improvement. The key to success is total management support
and encouragement. The responsibility to analyze and execute rests with the
developers and verifiers. The responsibility to allocate resources and make
decisions that support process improvement rests with the managers.

The practice of process improvement has a number of positive outcomes. A
process that is partially or totally undefined will have to be defined in writing in
order to subject it to process improvement. This further stabilizes the process and
tends to make it repeatable. The continued practice of improvement defines clear
procedures for change and enables gradual technology insertion. There is less
resistance to new technology, because the implementors of change are the same
people who suggest it. At the very least, there will be a willingness to try new ideas.

Another important advantage of process improvement is its built-in 
on-the-job training environment. Membership on a process evaluation team is 
an excellent first assignment for new personnel. This responsibility encourages 
immediate participation and teamwork, teaches the process definition and its 
change procedures, and stimulates creative thinking in the form of improvements. 
New personnel are generally enthusiastic about contributing and bring fresh ideas
into the organization. 

The long-term benefits of process improvement are also significant. The
procedures make the development process self-correcting. Therefore, over time, 
the number of errors inserted during each phase of software development 
decreases. This translates to a decrease in re-work and greater efficiency for all 
sub-processes. For example, verifiers may experience fewer problems during 
dynamic testing, because fewer (if any) serious errors exist that could interrupt, 
delay, or prevent test completion. 

Process improvement techniques are not new. As previously mentioned, they
are essentially TQM techniques applied to the software development process.
The SEI Capability Maturity Model (CMM) for the software development process
contains procedures for process improvement [33]. Furthermore, the SEI Quality
Subgroup of the Software Metrics Definition Working Group and the Software
Process Measurement Project Team have developed a draft framework for
documenting software problems [34]. Such a standard collection mechanism can
ultimately measure progress, enable estimations, and guide process
improvement. Dr. Michael Fagan, who originally developed formal inspection
procedures [29], now trains developers and managers to improve their software
productivity and quality with a three-step process: formal process definition.
life cycle. Included are evaluation checklists that are similar to those used by IBM
in their formal inspections. The ideas in this framework have appeared in other
DoD manuals, but only as guidelines. For example, they are contained in Draft
Army Technical Bulletin 18-102-2 (1985) and U.S. Army Information Systems
Software Center Pamphlet 25-1 (1990), Software Quality Engineering Handbook
[37]. In contrast to the widely published results of formal inspections, Rome Labs'
Software Quality Framework has received less visibility. However, the adoption
and extensive use of very similar quality methods by NEC Corporation [38] and
Metriqs, Inc. is evidence of its potential value [39].

DoD appears to have a more positive, pro-active attitude toward software 
quality. We believe that the establishment and support of the Army Software Test 
and Evaluation Panel (Army STEP) is very significant. The mandate to employ 
the Army STEP Metrics may be the first high-level action taken to implement 
software quality practices in the military. The Army's serious attention to metrics 
represents a significant shift by Army management toward software development 
as an engineering discipline. This also indicates management willingness to 
expend the resources for early measurements to gain control of quality. We
believe that DoD should take this opportunity to encourage, support, and
motivate these efforts. Management-supported metrics are a positive first step.
However, these should not be collected for the sake of project management alone. 
Quality requirements should be formulated during the requirements definition 
phase. This could be accomplished using the Rome Labs Software Quality 
Framework. Once established, quality requirements should be measured through 
standard metrics and checked in detail through formal inspections using the 
checklists associated with the requirements. 

7. Introduce CASE Tools to Support Well-Defined Sub-Processes

Because Computer Assisted Software Engineering (CASE) tools apply to all 
aspects of software development including testing, and because we have expanded 
the view of testing to encompass the entire life cycle, we discuss here the potential 
impact of CASE technology on DoD, and its relationship to the techniques
presented thus far.

The activities of software engineering which most impact software quality
(coordinated planning, formal inspection, configuration control, verification
testing, and process improvement) should be repeatable, and yet adjustable, if
they are to be effective. Often, the most efficient means of standardizing a process
and making it repeatable is by automating it. CASE tools do this for software
development and testing processes. However, automating any process first
requires that its procedures be well-defined, well-understood, and practiced.
CASE technology cannot impose a methodology on an ad hoc software
development environment [40]. Applying CASE technology in order to structure
a manual process that is not working simply exacerbates a poor process. For
example, without a well-defined, well-understood manual procedure for
configuration management, an organization should not expect to effectively
control software configuration using a CASE tool. Automating a bad process only
escalates a bad situation. Initially, DoD organizations should use CASE tools only
for those processes which are well-defined and practiced.

There are many examples of organizations which have adopted CASE tools
only to abandon them because the anticipated improvements never materialized.
One reason for this was described above. Another reason is an underestimation
of the CASE tool training requirements [41]. Because CASE tools support
methods, but do not impose them, an organization must recognize the difference
between learning the tool and learning the method the tool supports (especially if
the supported method is not currently practiced!). There are learning curves
associated with each [41]. If the method is understood and practiced, then only
the tool presents a learning shortfall. However, if the tool supports a new method
unfamiliar to the developers, then two shortfalls exist. The training and time needed
to overcome both may add significantly to the CASE tool investment. The compound
training requirement (for both method and tool) will also extend the time required
to show a return on the investment. DoD should sensitize management to the
large investment required to train personnel in both the new tools and, as
necessary, the associated new methods.

The adoption of CASE technology within DoD should be an evolutionary
process that begins small and grows gradually. Controlled institution of CASE
tools has a greater potential for immediate success and visible investment return
than a massive, overwhelming introduction. The adoption of standard software
process metrics gives DoD the means to establish the value of CASE tools. DoD
should not force its managers to overreact. Rather, it should make the time and
financial resources available for managers to adequately train and methodically
insert standard CASE tools into practical use. Both CASE tool users and
management must restrain expectations of immediate results. They must
anticipate the learning curve(s), measure progress, and continue with process
improvement to insert additional CASE technology.

Our observations of and discussions with IBM FSC Houston strongly support
this approach to introduction of CASE tools. Developers at IBM FSC Houston
have produced and continue to produce high quality software using manual
processes (supported by non-integrated databases and word processing tools).
CASE tools are just now being considered. However, deployment of CASE tools
will be closely monitored and controlled through formal process improvement
[42]. This will ensure that the adopted tools properly enhance their existing
methods. CASE technology will probably cause changes (improvements) in their
current methods, but only through the formal improvement process. We highly
recommend that DoD consider process improvement as the technology insertion
mechanism for instituting CASE tools.

8. Conclusions and Recommendations

DoD knows how to produce quality software. The Army has recognized the 
critical relationship between measurement and quality control, and is now 
implementing mechanisms that can decrease the cost and increase the quality of 
software production. Software testing is related to these mechanisms as a key 
data-gathering technique. However, quality cannot be tested into software. 
Quality must be designed through engineering. Engineering requires an 
expanded view of testing to maximize the effectiveness and reduce the cost of 
verification and acceptance testing. DoD testing activities should begin during 
requirements definition and should influence the entire software development life 
cycle. 

The general principles of engineering applied by IBM and other commercial
companies to produce quality MIS, C2, or MSCR software are the same:
modeling, standardization, measurement, and process control. However, the
empirical evidence has come from environments in which the
developing/testing/maintaining organization, the software, and the customer have been
relatively constant for a long period of time. For example, IBM FSC Houston has
developed and tested the Space Shuttle flight and ground software for NASA for
over 15 years, adequate time to stabilize their development process. However,
real cost savings have been reported even in cases where these procedures were
initiated during the verification phase of a project [31]. Nonetheless, our
recommendations should be viewed in the context of the DoD environment.
There is a variety of contractors, customers, software, and relationships
among them. Detailed standardization of software engineering activities would
be too restrictive and probably counter-productive. Instead, DoD guidelines
should be standardized locally through detailed, written procedures.

On the basis of our discussions and conclusions, we recommend the following
actions:

(1) Actively motivate a software quality attitude in DoD and government 
contractors. Implement by adopting and training a process (such as the Air Force 
APPENDIX - List of Acronyms



AAS Advanced Automation System 

AIRMICS Army Institute for Research in Management Information, 
Communications, and Computer Sciences 

ANSI American National Standards Institute 

Army STEP Army Software Test and Evaluation Panel

C2 Command and Control 

C3I Command, Control, Communications, and Intelligence 

CASE Computer Assisted Software Engineering 

CMM Capability Maturity Model

DoD Department of Defense 

DoD STEP DoD Software Test and Evaluation Project

FSC Federal Systems Company 

FSD Federal Sector Division 

IBM International Business Machines Corporation 

IEEE Institute of Electrical and Electronics Engineers 

IV&V Independent Verification and Validation 

MIS Management Information System 

MSCR Materiel System Computer Resource 

NASA National Aeronautics and Space Administration

OASD Office of the Assistant Secretary of Defense 

SEI Software Engineering Institute

SSC Standard Systems Center 

TQM Total Quality Management 